Journal of Computational Chemistry — Latest Matching Preprints

1

Bayesian Maximum Entropy Ensemble Refinement

Eltzner, B.; Hofstadler, J.; Rudolf, D.; Habeck, M.; de Groot, B.

2023-09-15 bioinformatics 10.1101/2023.09.12.557310 medRxiv

Top 0.1%

18.9%

The principle of maximum entropy provides a canonical way to incorporate measurement results into a thermodynamic ensemble. Observable features of a thermodynamic system, which are measured as averages over an ensemble, are included in the partition function by using Lagrange multipliers. Applying this principle to the system's energy leads to the well-known exponential form of the Boltzmann probability density. Here, we present a Bayesian approach to the estimation of maximum entropy parameters from nuclear Overhauser effect measurements in order to achieve a refined ensemble in molecular dynamics simulations. To achieve this goal, we leverage advances in the treatment of doubly intractable Bayesian inference problems by adaptive Markov Chain Monte Carlo methods. We illustrate the properties and viability of our method for alanine dipeptide as a simple model system and trp-cage as an example of a more complex peptide.

2

On quantum computing and geometry optimization

Malik, A. J.; Verma, C. S.

2023-03-20 bioinformatics 10.1101/2023.03.16.532929 medRxiv

Top 0.1%

10.0%

Quantum computers have demonstrated advantage in tackling problems considered hard for classical computers and hold promise for tackling complex problems in molecular mechanics, such as mapping the conformational landscapes of biomolecules. This work explores a few ways in which classical data, relating to the Cartesian space representation of biomolecules, can be encoded for interaction with empirical quantum circuits not demonstrating quantum advantage. Using the quantum circuit in a variational arrangement together with a classical optimizer, this work deals with the optimization of spatial geometries with potential application to molecular assemblies. Additionally, this work uses quantum machine learning for protein side-chain rotamer classification and uses an empirical quantum circuit for random state generation for Monte Carlo simulation for side-chain conformation sampling. Altogether, this novel work suggests ways of bridging the gap between conventional problems in life sciences and how potential solutions can be obtained using quantum computers. It is hoped that this work will provide the necessary impetus for wide-scale adoption of quantum computing in life sciences.

3

Linearised loop kinematics to study pathways between conformations

Hoevenaars, A. G. L.; Andre, I.

2021-04-11 bioinformatics 10.1101/2021.04.11.439310 medRxiv

Top 0.1%

10.0%

Conformational changes are central to the function of many proteins. Characterization of these changes using molecular simulation requires methods to effectively sample pathways between protein conformational states. In this paper we present an iterative algorithm that samples conformational transitions in protein loops, referred to as the Jacobian-based Loop Transition (JaLT) algorithm. The method uses internal coordinates to minimise the sampling space, while Cartesian coordinates are used to maintain loop closure. Information from the two representations is combined to push sampling towards a desired target conformation. The innovation that enables the simultaneous use of Cartesian coordinates and internal coordinates is the linearisation of the inverse kinematics of a protein backbone. The algorithm uses the Rosetta all-atom energy function to steer sampling through low-energy regions and uses Rosetta's side-chain energy minimiser to update side-chain conformations along the way. Because the JaLT algorithm combines a detailed energy function with a low-dimensional conformational space, it is positioned in between molecular dynamics (MD) and elastic network model (ENM) methods. As a proof of principle, we apply the JaLT algorithm to study the conformational transition between the open and occluded state in the MET20 loop of the Escherichia coli dihydrofolate reductase enzyme. Our results show that the algorithm generates semi-continuous pathways between the two states with realistic energy profiles. These pathways can be used to identify energy barriers along the transition. The effect of a single point mutation of the MET20 loop was also investigated and the predicted increase in energy barrier is consistent with the experimentally observed reduction in catalytic rate of the enzyme. Additionally, it is demonstrated how the JaLT algorithm can be used to identify dominant degrees of freedom during a transition. This can be valuable input for a more extensive characterization of the free energy pathway along a transition using molecular dynamics, which is often performed with a reduced set of degrees of freedom. This study has thereby provided the first examples of how linearisation of inverse kinematics can be applied to the analysis of proteins.

4

Continuous B- to A- Transition in Protein-DNA Binding - How Well Is It Described by Current AMBER Force Fields?

Jurecka, P.; Zgarbova, M.; Cerny, F.; Salomon, J.

2022-01-13 biophysics 10.1101/2022.01.13.476176 medRxiv

Top 0.1%

9.7%

When DNA interacts with a protein, its structure often undergoes significant conformational adaptation. Perhaps the most common is the transition from canonical B-DNA towards the A-DNA form, which is not a two-state, but rather a continuous transition. The A- and B-forms differ mainly in sugar pucker P (north/south) and glycosidic torsion χ (high-anti/anti). The combination of A-like P and B-like χ (and vice versa) represents the nature of the intermediate states lying between the pure A- and B- forms. In this work, we study how the A/B equilibrium and in particular the A/B intermediate states, which are known to be over-represented at protein-DNA interfaces, are modeled by current AMBER force fields. Eight protein-DNA complexes and their naked (unbound) DNAs were simulated with OL15 and bsc1 force fields as well as an experimental combination OL15χOL3. We found that while the geometries of the A-like intermediate states in the molecular dynamics (MD) simulations agree well with the native X-ray geometries found in the protein-DNA complexes, their populations (stabilities) are significantly underestimated. Different force fields predict different propensities for A-like states growing in the order OL15 < bsc1 < OL15χOL3, but the overall populations of the A-like form are too low in all of them. Interestingly, the force fields seem to predict the correct sequence-dependent A-form propensity, as they predict larger populations of the A-like form in naked (unbound) DNA in those steps that acquire A-like conformations in protein-DNA complexes. The instability of A-like geometries in current force fields may significantly alter the geometry of the simulated protein-DNA complex, destabilize the binding motif, and reduce the binding energy, suggesting that refinement is needed to improve description of protein-DNA interactions in AMBER force fields.

5

Why Many Molecular Simulation Research Findings Might Be False: An Analysis of Inter-Simulations Differences Based on Simulation Time and Number of Replicas

Knapp, B.; Deane, C. M.

2022-08-25 bioinformatics 10.1101/2022.08.23.504950 medRxiv

Top 0.1%

9.7%

Molecular simulations are a common technique to investigate the dynamics of proteins, DNA and RNA. A typical application is the simulation of a wild-type structure and a mutant structure where the mutant has a significantly higher (or lower) potency to trigger a signalling cascade. The study would then analyse the observed differences between the wild-type and mutant simulation and link these to their differences. However, differences in the simulations cannot always be reproduced by other research groups even if the same parameters as in the original simulations are used. This is caused by the rugged energy landscape of many biological structures, which means that minor differences in hardware or software can cause simulations to take different paths. This would not be a problem if the simulation time were infinitely long, but in real life the simulation time is always finite. In this study we use large-scale molecular simulations of four different systems (a 10-mer peptide wild-type and mutant as well as a T-cell receptor, peptide and MHC complex as wild-type and mutant) with 100 replicas each, totalling 620 000 ns, to quantify the magnitude of (non-) reproducibility when comparing inter-simulation differences (e.g. wild-type vs mutant). Using a bootstrapping approach, we found that simulation times of at least 2 to 3 times the experimental folding time using a minimum of 3 replicas are necessary for reproducible results. However, for most complexes of interest such long simulation times are far out of reach, which means that it is only possible to sample the local phase space neighbourhood of the x-ray structure. To sample this neighbourhood reliably, around 10 to 20 replicas are needed.
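The bootstrapping idea mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' procedure: the function name `sign_reproducibility`, the use of one scalar observable per replica, and the toy numbers are all assumptions made for illustration. The sketch resamples a small number of replicas from each pool and reports how often the sign of the wild-type vs mutant difference matches the sign seen with all replicas.

```python
import random
from statistics import mean

def sign_reproducibility(wt, mut, n_replicas, n_boot=1000, seed=0):
    """Fraction of bootstrap draws whose wild-type vs mutant difference has
    the same sign as the difference computed from all replicas.

    wt, mut: one scalar observable per replica (hypothetical input format).
    """
    rng = random.Random(seed)              # seeded for repeatable output
    reference = mean(mut) > mean(wt)       # sign seen with the full pools
    agree = 0
    for _ in range(n_boot):
        # Draw n_replicas values with replacement from each pool.
        wt_sample = [rng.choice(wt) for _ in range(n_replicas)]
        mut_sample = [rng.choice(mut) for _ in range(n_replicas)]
        if (mean(mut_sample) > mean(wt_sample)) == reference:
            agree += 1
    return agree / n_boot

# Toy data (made up): two clearly separated pools always agree in sign.
wt_values = [1.0, 1.1, 0.9]
mut_values = [5.0, 5.1, 4.9]
separated = sign_reproducibility(wt_values, mut_values, n_replicas=3)
```

With overlapping pools the returned fraction drops below 1, and repeating the call for increasing `n_replicas` gives a rough picture of how many replicas a comparison might need.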

6

Integrated Database of Force-Field Parameters, Experimental Measurements and Molecular Dynamics Simulations

Banas, P.; Mlynsky, V.; Ciz, D.; Furmanek, R.; Pilat, N.; Pauw, V.; Hachinger, S.; Sponer, J.; Martinovic, J.; Otyepka, M.

2024-12-04 bioinformatics 10.1101/2024.12.03.626554 medRxiv

Top 0.1%

9.6%

Molecular dynamics (MD) simulation is a vital theoretical tool for exploring nucleic acids (RNA, DNA), proteins and other (bio)molecular systems, generating vast amounts of data daily. Efficient storage and possible reuse of these data are a persistent challenge. Here, we introduce IDA (Integrated DAtabase of force fields and datasets from experiments and MD simulations), an innovative database scheme for datasets from various types of MD simulations. IDA supports outputs from different MD approaches, i.e., standard MD simulations, importance sampling techniques, simulated annealing, and other enhanced sampling methods including replica-exchange simulations. IDA also houses a collection of molecule-specific force fields (FFs) and experimental datasets. Uploaded MD outputs, FFs, and experimental data are integrated in a standardized format, allowing efficient data mining and extraction of valuable insights from the extensive data generated by diverse MD simulations. With the data and metadata holdings of IDA, and the prospective assignment of persistent identifiers, our work aims to make key steps towards making MD data FAIR (findable, accessible, interoperable, reusable).

7

Generalizable Quantum Computing Pipeline for Real World Drug Discovery

Li, W.; Yin, Z.; Li, X.; Ma, D.; Zhang, Z.; Zou, C.; Bu, K.; Dai, M.; Yue, J.; Chen, Y.; Zhang, X.; Zhang, S.

2024-01-09 bioinformatics 10.1101/2024.01.08.574600 medRxiv

Top 0.1%

8.1%

Quantum computing, with its superior computational capabilities compared to classical approaches, holds the potential to revolutionize numerous scientific domains, including pharmaceuticals. However, the application of quantum computing for drug discovery has primarily been limited to proof-of-concept studies, which often fail to capture the intricacies of real-world drug development challenges. In this study, we diverge from conventional investigations by developing an advanced quantum computing pipeline tailored to address genuine drug design problems. Our approach underscores the pragmatic application of quantum computation and propels it towards practical industrial adoption. We specifically construct our versatile quantum computing pipeline to address two critical tasks in drug discovery: the precise determination of Gibbs free energy profiles for prodrug activation involving covalent bond cleavage, and the accurate simulation of covalent bond interactions. This work serves as a pioneering effort in benchmarking quantum computing against veritable scenarios encountered in drug design, especially the covalent bonding issue present in both of the case studies, thereby transitioning from theoretical models to tangible applications. Our results demonstrate the potential of a quantum computing pipeline for integration into real world drug design workflows.

8

STORMM: Structure and TOpology Replica Molecular Mechanics for chemical simulations

Cerutti, D. S.; Boothroyd, S.; Wiewiora, R.; Sherman, W.

2024-03-28 biophysics 10.1101/2024.03.27.587048 medRxiv

Top 0.1%

7.9%

The Structure and TOpology Replica Molecular Mechanics (STORMM) code is a next-generation molecular simulation engine and associated libraries optimized for performance on fast, multicore central processor units (CPUs) and graphics processing units (GPUs) with independent memory and tens of thousands of threads. STORMM is built to run thousands of independent molecular mechanical calculations on a single GPU with novel implementations that optimize numerical precision, mathematical operations, throughput, and resource management. The libraries are built around accessible classes with detailed documentation, supporting fine-grained parallelism and algorithm development as well as macroscopic manipulations of groups of systems on and off of the GPU. A primary intention of the STORMM libraries is to provide developers of atomic simulation methods with access to a high-performance molecular mechanics engine with extensive facilities to prototype and develop bespoke tools aimed toward drug discovery applications. In its present state, STORMM delivers molecular dynamics simulations of small molecules and small proteins in implicit solvent with tens to hundreds of times the throughput of conventional codes. The engineering paradigm also transforms two of the most memory bandwidth-intensive aspects of condensed-phase dynamics, particle-mesh mapping and valence interactions, into compute-bound problems for several times the scalability of existing programs. Numerical methods for getting the most out of each bit of information present in stored coordinates and lookup tables are also presented, delivering improved accuracy over methods implemented in other molecular dynamics engines. The open-source code is released under the MIT license.

9

Easy Removal of Steric Clashes and Entanglements in Macromolecular Systems by Temporary Addition of a Fourth Spatial Dimension

Elcock, A. H.

2023-04-28 biophysics 10.1101/2023.04.26.537866 medRxiv

Top 0.1%

7.3%

When models of complicated macromolecular systems are constructed, it is common to inadvertently include either gross steric clashes or entanglements of extended loop regions. Removing these problems with conventional energy minimization or dynamics algorithms can often be difficult. Here I show that one easy alternative is to temporarily add an extra spatial dimension and to displace atoms or molecules along this fourth dimension such that the distances between atoms, when measured in 4D, are no longer considered clashing. Adding in half-harmonic potential functions to mimic walls in this 4th dimension, and then moving these walls toward each other, has the effect of decreasing the space available in the 4th dimension and drives atoms to avoid each other in 3D. I illustrate the method with three examples: two showing how interlocked ring polymers can be easily disentangled from each other in both 2D and 3D, and one showing how ten identical coarse-grained protein models, all placed at the same point in 3D space, can be separated from each other, without distorting their structures, during the course of a single energy minimization. A sample program implementing the method is available that can be easily adapted to other situations.
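The two ingredients described in this abstract, distances measured in 4D and half-harmonic walls along the fourth coordinate, can be sketched as follows. This is an illustrative sketch only and not the author's sample program; the names `distance_4d` and `wall_energy`, the force constant `k`, and the coordinates are assumptions.

```python
import math

def distance_4d(a, b):
    """Euclidean distance between two points given as (x, y, z, w)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def wall_energy(w, half_width, k=1.0):
    """Half-harmonic wall along the fourth coordinate.

    Zero while |w| <= half_width, quadratic once the atom is past a wall.
    Shrinking half_width squeezes atoms back towards w = 0.
    """
    excess = abs(w) - half_width
    return 0.5 * k * excess ** 2 if excess > 0 else 0.0

# Two atoms at the same 3D position, displaced in opposite directions along w:
# they overlap completely in 3D but are 3.0 apart when measured in 4D.
atom_a = (0.0, 0.0, 0.0, -1.5)
atom_b = (0.0, 0.0, 0.0, 1.5)
separation = distance_4d(atom_a, atom_b)

# With walls at |w| = 2.0 the atoms feel no wall energy; at |w| = 1.0 they do.
energy_wide = wall_energy(atom_a[3], half_width=2.0)
energy_narrow = wall_energy(atom_a[3], half_width=1.0)
```

In a full minimization, the wall term would be added to the usual clash penalty evaluated on 4D distances, and `half_width` would be reduced step by step towards zero.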

10

Glycine molecule radical: Predicted properties and dipeptide formation

Synak, J.; Blazewicz, J.

2026-07-10 bioinformatics 10.64898/2026.07.07.736934 medRxiv

Top 0.1%

6.8%

Numerous advances in quantum and computational chemistry over the last decades, as well as the development of computer science, have allowed the use of more precise and complex models, which can now be applied to much bigger systems than in the past. The authors used Gaussian, coupled with theoretical methods, to predict a new way of peptide bond formation, which could have taken place in prebiotic conditions. To better tackle this difficult task, the properties of the substrates (glycine-derived radicals) were extensively analysed using Gaussian, while taking resonance and hybridisation into account, to better understand the stereochemistry and the very nature of the processes taking place. The result is a series of reactions which, without any sophisticated catalysts and with relatively low energy thresholds ({inverted exclamation}20 kcal/mol), can lead to the formation of dipeptides (and further, oligopeptides). The authors also hope that the other predicted properties of the investigated molecules can be of use to any researcher who would like to utilise them in their experiments. Author summary: Our goal was to investigate a way in which the first peptide bonds could have formed in prebiotic conditions. This is an extremely important step in research into the beginning of life on Earth. We found a very promising series of reactions, which uses atomic hydrogen as its only catalyst, and confirmed our expectations with theoretical calculations using Gaussian. There are two radicals derived from glycine which perform major roles in the process, so we investigated their properties with Gaussian and verified that the results are in agreement with our own theoretical considerations. This involved checking for possible geometric isomers and conformers and creating models which could explain their properties. We are well aware that such calculations have limitations and that no model is 100% accurate, so our results should be further confirmed by empirical data in the future. However, we still tried to be as thorough as possible in how we approached the subject.

11

Is Tanimoto a metric?

Surendran, A.; Zsigmond, K.; Lopez Perez, K.; Miranda Quintana, R. A.

2025-02-23 bioinformatics 10.1101/2025.02.18.638904 medRxiv

Top 0.1%

6.8%

No. However, here we show how to generate a metric consistent with the Tanimoto similarity. We also explore new properties of this index, and how it relates to other popular alternatives.

12

Combining statistical and neural network approaches to derive energy functions for completely flexible protein backbone design

Huang, B.; Xu, Y.; Liu, H.

2019-06-18 bioinformatics 10.1101/673897 medRxiv

Top 0.1%

6.8%

A designable protein backbone is one for which amino acid sequences that stably fold into it exist. To design such backbones, a general method is much needed for continuous sampling and optimization in the backbone conformational space without specific amino acid sequence information. The energy functions driving such sampling and optimization must faithfully recapitulate the characteristically coupled distributions of multiplexes of local and non-local conformational variables in designable backbones. It is also desired that the energy surfaces are continuous and smooth, with easily computable gradients. We combine statistical and neural network (NN) approaches to derive a model named SCUBA, standing for Side-Chain-Unspecialized-Backbone-Arrangement. In this approach, high-dimensional statistical energy surfaces learned from known protein structures are analytically represented as NNs. SCUBA is composed as a sum of NN terms describing local and non-local conformational energies, each NN term derived by first estimating the statistical energies in the corresponding multi-variable space via neighbor-counting (NC) with adaptive cutoffs, and then training the NN with the NC-estimated energies. To determine the relative weights of different energy terms, SCUBA-driven stochastic dynamics (SD) simulations of natural proteins are considered. As initial computational tests of SCUBA, we apply SD simulated annealing to automatically optimize artificially constructed polypeptide backbones of different fold classes. For a majority of the resulting backbones, structurally matching native backbones can be found with Dali Z-scores above 6 and less than 2 Å displacements of main chain atoms in aligned secondary structures. The results suggest that SCUBA-driven sampling and optimization can be a general tool for protein backbone design with complete conformational flexibility. In addition, the NC-NN approach can be generally applied to develop continuous, noise-filtered multi-variable statistical models from structural data.

Linux executables to set up and run SCUBA SD simulations are publicly available (http://biocomp.ustc.edu.cn/servers/download_scuba.php). Interested readers may contact the authors for source code availability.

13

A Quantum Lens on Molecular Design: A Machine-Learned Energy Function from Interacting Quantum Atoms.

Hoffmann, M.; Kazimir, A.; Oesterreich, T.; Kaermer, L.; Engelberger, F.; Meiler, J.; Lamers, C.

2026-03-05 bioinformatics 10.64898/2026.03.03.709242 medRxiv

Top 0.1%

6.4%

Accurate predictions of the interactions (covalent bonds and non-covalent contacts between atoms) in a molecular system require scalable, accurate, and interpretable energy functions. While classical force fields and knowledge-based energy functions struggle to capture key electronic effects, quantum chemistry approaches such as density functional theory (DFT) provide the necessary accuracy but remain computationally demanding. Furthermore, gaining insight into interactions requires energy decomposition schemes. The Interacting Quantum Atoms (IQA) scheme is exceptionally attractive, offering a chemically intuitive separation into intra- and interatomic contributions based on the topology of the electron density (ED); however, its high computational cost remains a significant barrier for application to larger systems or tasks like ligand screening in drug discovery. We address these limitations by introducing a novel machine learning (ML) framework to predict accurate energies derived from the IQA scheme, together with a comprehensive dataset of molecular systems and their calculated IQA decomposed energies. It enables the rapid and accurate prediction of DFT single point energies and dissects these energies in a physically meaningful and chemically intuitive manner. Our method predicts all intra-atomic energies and inter-atomic interaction energies (covalent and non-covalent) within a defined distance cutoff, providing an energy function that decomposes the total energy into specific atomic contributions. This advance makes the IQA method viable for analyzing interaction energies in applications previously inaccessible due to computational expense, such as elucidating ligand-binding mechanisms and informing rational drug design.

14

A Compact Matrix Representation for Hydrocarbon and Its Applications on Monoterpene Carbocation Rearrangement Elementary Steps

Tse, T.

2025-08-28 bioinformatics 10.1101/2025.08.23.671925 medRxiv

Top 0.1%

6.4%

This work proposes a compact matrix-based representation for hydrocarbon carbocations ([CxHy]m+). In addition, its application is explored for the elementary steps of non-stereoisomeric [C10H17]+ rearrangement.

15

A Hierarchical Method to Analyze Protein-DNA Interfaces

Tagad, A.; Patwari, G. N.

2024-07-19 bioinformatics 10.1101/2024.07.18.604047 medRxiv

Top 0.1%

6.3%

The accessibility of genetic information and packaging of long chromosomal DNA into micron-sized nuclei is a great example of protein-DNA interaction. A large number of protein-DNA structures are available in the database, and the number is continuously increasing. To analyse such huge data and extract meaningful insights, several computational tools, such as hydrogen bonding, SASA, and all intermolecular heavy atom-to-atom contacts, as well as web servers and databases such as DNAproDB, PDIdb, PDB2PQR, and Bio-python, are available. Generally, the interaction between the protein and DNA is analyzed based on all heavy atom-to-atom contact matrices, which are computationally expensive. Herein, a new and robust hierarchical approach is developed to analyze the protein-DNA interface. The present hierarchical method comprises two steps: the first step is to recognize the protein residue-nucleotide pairs at the interface by employing a pairwise distance cut-off between the C of the protein residues and O5' of the nucleotide, and the second step is to calculate the heavy atom-to-atom contact matrix using the first step as a qualifier. This method reduces the computational cost by three orders of magnitude, making it tractable even on personal computers. On the whole, the protein-DNA interface is dominated by the arginine residues with a notable presence of lysine, serine, and tyrosine, highlighting the pivotal role of electrostatic interactions and hydrogen bonding in aggregation.
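The two-step scheme in this abstract (a cheap pairwise filter on one representative atom per residue and nucleotide, followed by heavy-atom contacts only for the pairs that pass) might look roughly like the sketch below. It is not the authors' code: the function name `hierarchical_contacts`, the dictionary layout, the cut-off values and the coordinates are all hypothetical.

```python
import math
from itertools import product

def hierarchical_contacts(residues, nucleotides, coarse_cutoff, fine_cutoff):
    """Count heavy-atom contacts only for pairs that pass a coarse filter.

    residues / nucleotides: dict mapping a name to
        {"rep": (x, y, z), "atoms": [(x, y, z), ...]}
    where "rep" is the single representative atom used in step one
    (hypothetical layout chosen for this sketch).
    """
    contacts = {}
    for (res_name, res), (nuc_name, nuc) in product(residues.items(),
                                                    nucleotides.items()):
        # Step 1: skip pairs whose representative atoms are far apart.
        if math.dist(res["rep"], nuc["rep"]) > coarse_cutoff:
            continue
        # Step 2: all heavy atom-to-atom distances, for qualifying pairs only.
        count = sum(1 for a, b in product(res["atoms"], nuc["atoms"])
                    if math.dist(a, b) <= fine_cutoff)
        if count:
            contacts[(res_name, nuc_name)] = count
    return contacts

# Toy coordinates (made up): ARG1 sits next to DA1, GLY2 is far away.
example_residues = {
    "ARG1": {"rep": (0.0, 0.0, 0.0), "atoms": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]},
    "GLY2": {"rep": (50.0, 0.0, 0.0), "atoms": [(50.0, 0.0, 0.0)]},
}
example_nucleotides = {
    "DA1": {"rep": (4.0, 0.0, 0.0), "atoms": [(4.0, 0.0, 0.0), (3.0, 0.0, 0.0)]},
}
example_contacts = hierarchical_contacts(example_residues, example_nucleotides,
                                         coarse_cutoff=10.0, fine_cutoff=3.5)
```

The saving comes from step one: the expensive atom-by-atom loop runs only for the small fraction of residue-nucleotide pairs near the interface.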

16

Per-residue optimisation of protein structures: Rapid alternative to optimisation with constrained alpha carbons

Schindler, O.; Bucekova, G.; Svoboda, T.; Svobodova, R.

2025-11-26 bioinformatics 10.1101/2025.11.24.690085 medRxiv

Top 0.1%

6.2%

In recent years, the number of known protein structures has increased significantly. Predictive algorithms and experimental methods provide the positions of protein residues relative to each other with high accuracy. However, the local quality of the protein structure, including bond lengths, angles, and positions of individual atoms, often lacks the same level of precision. For this reason, protein structures are usually optimised by a force field prior to their application in further research sensitive to structural quality. Protein structure optimisation, however, is computationally challenging. In this paper, we introduce a general method Per-residue optimisation of protein structures: Rapid alternative to optimisation with constrained alpha carbons (PROPTIMUS RAPHAN). Rather than optimising the entire protein structure at once, PROPTIMUS RAPHAN divides the structure into overlapping residual substructures and optimises each substructure individually. This approach results in computational time that scales linearly with the size of the structure. Additionally, we present PROPTIMUS RAPHANGFN-FF, a reference implementation of our method employing a generic, almost QM-accurate force field, GFN-FF. We tested PROPTIMUS RAPHANGFN-FF on 461 AlphaFold DB structures and demonstrated that our approach achieves results comparable to the optimisation of the structure with constrained alpha carbons in significantly less time. Scientific Contribution: The main contribution of this work is the PROPTIMUS RAPHAN method and its reference parallelisable implementation PROPTIMUS RAPHANGFN-FF. Because the time requirement increases linearly with the size of the structure, PROPTIMUS RAPHANGFN-FF optimises on average 5 000 atoms per hour and a common CPU. Therefore, prior to any research sensitive to protein structure quality, our method can be employed to obtain protein structures closer to QM-accuracy.

17

Fast and Accurate Estimation of Gas-Phase Entropy from the Molecular Surface Curvature

Venkatraman, V.; Roy, A.

2021-05-27 bioinformatics 10.1101/2021.05.26.445640 medRxiv

Top 0.1%

5.7%

Estimating entropy is crucial for understanding and modifying biological systems, such as protein-ligand binding. Current computational methods to estimate entropy require extensive, or at times prohibitively extensive, computational resources. This article presents SHAPE (SHape-based Accurate Predictor of Entropy), a new method that estimates the gas-phase entropy of small molecules purely from their surface geometry. The gas-phase entropy of small molecules can be computed in ≈0.01 CPU hours with run time complexity of [Formula], where Na is the number of atoms. The accuracy of SHAPE is within 1-2% of computationally expensive quantum mechanical or molecular mechanical calculations. We further show that the inclusion of gas-phase entropy, estimated using SHAPE, improves the rank-order correlation between binding affinity and binding score from 0.18 to 0.40. The speed and accuracy of SHAPE make it well-suited for inclusion in molecular docking or QSAR (quantitative structure-activity relationships) methods.

18

Systematic Data-Driven Penalty Calibration for Constrained Quantum Optimization with Application to Molecular Docking

Mukherjee, P.; Mandal, S.

2026-01-30 bioinformatics 10.64898/2026.01.27.699805 medRxiv

Top 0.1%

5.7%

This paper describes MMP, a three-stage framework for systematic quantum optimization of constrained molecular docking problems. The protocol addresses the "formulation bottleneck"--the critical challenge of translating constrained optimization problems into valid QUBO (Quadratic Unconstrained Binary Optimization) formulations for quantum solvers. MMP replaces heuristic penalty tuning with data-driven calibration through: (1) classical solution-space analysis to validate fragment libraries before quantum deployment, (2) systematic penalty sweeps to identify optimal "Goldilocks Zone" coefficients, and (3) MAC-QAOA (MMP Adaptive Constraint QAOA) with layer-dependent penalty decay. Preliminary benchmarks on synthetic constrained optimization problems demonstrate 99.7% solution validity at identified elbow points and 25.5% improvement in solution quality over static-penalty QAOA. MMP is hardware-agnostic but designed for near-term devices including Pasqal's Orion Gamma (140+ qubits). The theoretical framework, algorithmic details, and preliminary validation results of the protocol are discussed, establishing a systematic methodology for quantum-augmented optimization workflows for drug discovery. All benchmarks are conducted on synthetic constrained optimization instances that reproduce structural features of docking formulations; application to real molecular docking targets is left for future work.
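To make the idea of a penalty sweep concrete, here is a small classical sketch. It is not the MMP protocol: it assumes a toy "pick exactly one option" constraint, folds it into a QUBO-style energy as a quadratic penalty, and finds the minimum by brute force instead of a quantum solver. The function names, the cost vector and the penalty grid are all illustrative assumptions.

```python
from itertools import product

def qubo_energy(x, costs, penalty):
    """Objective plus quadratic penalty for the constraint sum(x) == 1."""
    objective = sum(c * xi for c, xi in zip(costs, x))
    return objective + penalty * (sum(x) - 1) ** 2

def best_assignment(costs, penalty):
    """Brute-force minimiser over all binary vectors (tiny problems only)."""
    return min(product((0, 1), repeat=len(costs)),
               key=lambda x: qubo_energy(x, costs, penalty))

def penalty_sweep(costs, penalties):
    """For each penalty value, report whether the minimiser is feasible."""
    return {p: sum(best_assignment(costs, p)) == 1 for p in penalties}

# Toy instance (made up): negative costs reward selecting an option, so a weak
# penalty lets the minimiser select several options and violate the constraint.
toy_costs = [-3.0, -2.0, -1.0]
sweep_result = penalty_sweep(toy_costs, [0.5, 1.0, 2.5, 4.0])
```

For this toy instance the minimiser becomes feasible once the penalty exceeds 2. Much larger values would also be valid but would dominate the energy scale, which is the kind of trade-off a calibrated sweep is meant to expose.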

19

Lira: Rotational Invariant Shape and Electrostatic Descriptors for Small Molecules and Protein Pockets based on Real Spherical Harmonics

Caires, F. R.; Silva, S. R.; Verissimo-Alves, M.; Pinheiro, V. B.; Montalvao, R. W.

2022-01-21 bioinformatics 10.1101/2022.01.19.476747 medRxiv

Top 0.1%

5.6%

Motivation: Modern AI-based tools are increasing the number of protein structures available, creating an opportunity and a challenge for automated high-throughput drug discovery pipelines. The amount of data is overwhelming for the current methods, thus demanding new high-performance approaches for Machine Learning-based rational drug design. As shape and electrostatics are the main components for understanding protein-ligand interaction, they are the primary targets for efficient AI-compatible descriptors and their associated comparison methods. Results: The Lira toolbox is a set of components devised for describing, comparing and analysing shape and electrostatics for small ligands, peptides and protein pockets. It can generate databases with descriptors for tens of millions of shapes in a few hours, which can then be queried in seconds. The Lira design, focused on performance and reliability, makes its integration into AI-driven rational drug design pipelines simple. Availability and implementation: Lira packages, available for download at https://pinheirolab.com/, are free to use for research and educational purposes.

20

Across atoms to crossing continents: Application of similarity measures to biological location data

Schuhmann, F.; Ryvkin, L.; McLaren, J. D.; Gerhards, L.; Solov'yov, I. A.

2022-06-22 bioinformatics 10.1101/2022.06.20.496870 medRxiv

Top 0.1%

5.6%

Biological processes involve movements across all measurable scales. Similarity measures can be applied to compare and analyze these movements but differ in how differences in movement are aggregated across space and time. The present study reviews frequently-used similarity measures, such as the Hausdorff distance, Frechet distance, Dynamic Time Warping, and Longest Common Subsequence, jointly with several measures less used in biological applications (Wasserstein distance, weak Frechet distance, and Kullback-Leibler divergence), and provides computational tools for each of them that may be used in computational biology. We illustrate the use of the selected similarity measures in diagnosing differences within two extremely contrasting sets of biological data, which, remarkably, may both be relevant for magnetic field perception by migratory birds. Specifically, we assess and discuss cryptochrome protein conformational dynamics and extreme migratory trajectories of songbirds between Alaska and Africa. We highlight how similarity measures contrast regarding computational complexity and discuss those which can be useful in noise elimination or, conversely, are sensitive to spatiotemporal scales.
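Two of the measures named in this abstract can be sketched in a few lines to show how differently they aggregate point-wise differences. These are textbook discrete versions written for illustration, not the computational tools provided by the authors; the function names and the toy trajectories are assumptions.

```python
import math

def hausdorff(P, Q):
    """Discrete Hausdorff distance between two point sets (order ignored)."""
    def directed(A, B):
        # Largest distance from a point in A to its nearest point in B.
        return max(min(math.dist(a, b) for b in B) for a in A)
    return max(directed(P, Q), directed(Q, P))

def dtw(P, Q):
    """Dynamic Time Warping cost between two ordered trajectories."""
    n, m = len(P), len(Q)
    inf = float("inf")
    # D[i][j] = minimal accumulated cost aligning P[:i] with Q[:j].
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(P[i - 1], Q[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

# Two parallel toy tracks, one unit apart.
track_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
track_b = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
h_value = hausdorff(track_a, track_b)
dtw_value = dtw(track_a, track_b)
```

The Hausdorff distance reports the single worst mismatch (1.0 here) and ignores ordering, whereas DTW sums the mismatch along the best monotone alignment (3.0 here) and therefore grows with trajectory length.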